library(tidyverse)
violence <- read_csv("https://github.com/databrew/maputo/blob/master/day3/data/moz.csv?raw=true")
Click on the data to look at it. What does this data appear to be about?
Make an object called x. This will be all the violent events in Mozambique grouped by the type of event (event_type) and then tallied (ie, counted).
Look at x. Which type of violence is most common?
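The group-then-count pattern can be sketched on a small made-up table. The `events` tibble and its values below are hypothetical stand-ins, not the Mozambique data.

```r
library(dplyr)

# Hypothetical toy data standing in for the real violence data
events <- tibble(
  event_type = c("Riots", "Battles", "Riots", "Protests", "Riots", "Battles")
)

# Group by the type of event, then count the rows in each group
counts <- events %>%
  group_by(event_type) %>%
  tally()

# Sort so the most common type comes first
counts %>% arrange(desc(n))
```

The count lands in a column called `n`, which is the column to sort or plot by.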
Make a barplot of the number of different types of violence.
Make a histogram of the number of fatalities.
Make a new object called y. This will be the violence data, grouped by year and event_type, and then summarized to get the total number of fatalities.
Plot the total number of fatalities by event type. This should be a barchart of y with year on the x-axis, the total number of fatalities on the y-axis, and the fill of the bars reflecting the event type.
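A minimal sketch of grouping by two variables, summing, and plotting with a fill, again on hypothetical toy data (the `events` values are invented for illustration):

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per event
events <- tibble(
  year       = c(2015, 2015, 2015, 2016, 2016, 2016),
  event_type = c("Riots", "Battles", "Riots", "Riots", "Battles", "Battles"),
  fatalities = c(1, 4, 0, 2, 3, 5)
)

# Total fatalities for each year / event type combination
totals <- events %>%
  group_by(year, event_type) %>%
  summarise(fatalities = sum(fatalities), .groups = "drop")

# The totals are already computed, so the bar heights come straight from the data
p <- ggplot(totals, aes(x = year, y = fatalities, fill = event_type)) +
  geom_bar(stat = "identity")
p
```

`stat = "identity"` tells geom_bar() to use the values in the data rather than counting rows.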
Create an object called z. This will be the violence data grouped by three variables: (a) year, (b) admin1 (the province) and (c) the event type. After grouping, use summarise to calculate the average number of fatalities for these kinds of incidents.
Create a barplot of z with year on the x-axis, the average number of fatalities on the y-axis, and the fill of the bars reflecting the event type. Use facet_wrap() to make different panels for each province.
Create an object called w which will be the total number of fatalities by year and by province.
Make a line chart of w, panelled by province.
Make a violin chart of the violence data in which year is on the x-axis and fatalities is on the y-axis. You’ll need to add group = year within aes().
Make the above chart panelled by admin1.
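Why group = year matters can be seen on simulated data. The `toy` data frame below is hypothetical; its column names only mimic the violence data.

```r
library(ggplot2)

set.seed(1)

# Hypothetical simulated data: 40 events per year, split over two provinces
toy <- data.frame(
  year       = rep(c(2015, 2016, 2017), each = 40),
  province   = rep(c("A", "B"), times = 60),
  fatalities = rpois(120, lambda = 2)
)

# year is numeric, so without group = year ggplot would draw one single violin
p <- ggplot(toy, aes(x = year, y = fatalities, group = year)) +
  geom_violin()
p

# One panel per province
p + facet_wrap(~province)
```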
library(tidyverse)
weather <- read_csv("https://github.com/databrew/maputo/blob/master/day3/data/mozambican_weather.csv?raw=true")
Take a look at the top of the data.
Create an object called x. This will be the weather data, then summarized to tell us the minimum and maximum dates.
Create an object called y. This will be the weather data, then summarized to tell us the maximum temp_max and the minimum temp_min.
Create an object called z. This will be the weather data, but filtered to only include the year 2014 and only the district of “MATOLA”.
Make a plot of z with the date on the x-axis and the average temperature (temp) on the y-axis.
Make a plot of z showing the distribution of precipitation.
Make a plot showing the distribution of the maximum temperature, but faceted by district.
Color the lines in the above red.
Add another distribution to the above chart showing the minimum temperature. Color it blue. To do this you will need to add another geom_density layer to your chart, and you can include another aes() section within that geom_density function.
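A sketch of two density layers, each with its own aes(), on simulated data. The `toy` data frame is hypothetical; only the column names temp_min, temp_max and district are borrowed from the exercise.

```r
library(ggplot2)

set.seed(1)

# Hypothetical simulated temperatures for two made-up districts
toy <- data.frame(
  district = rep(c("A", "B"), each = 50),
  temp_min = rnorm(100, mean = 18, sd = 2),
  temp_max = rnorm(100, mean = 28, sd = 2)
)

# Each geom_density() gets its own x variable and a fixed line color
p <- ggplot(data = toy) +
  geom_density(aes(x = temp_max), color = "red") +
  geom_density(aes(x = temp_min), color = "blue") +
  facet_wrap(~district)
p
```

The color sits outside aes() because it is a fixed value, not a variable in the data.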
library(tidyverse)
data('swiss')
swiss <- swiss
Look at the head of the data. Describe it.
Make a chart showing the association between Agriculture rate and Fertility rate. This should be a scatterplot. You’ll use geom_point.
Describe the association.
Add geom_smooth() as another layer on the above chart. What happens?
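One way this could look, using the built-in swiss data. geom_smooth() is left at its defaults here, which is an assumption about what the exercise intends.

```r
library(ggplot2)

# Scatterplot of Agriculture against Fertility, with a smoothed trend line on top
p <- ggplot(data = swiss,
            aes(x = Agriculture,
                y = Fertility)) +
  geom_point() +
  geom_smooth()
p
```

Layers are joined with `+`, so geom_smooth() goes after geom_point() rather than inside it.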
Run the following code to create a variable called canton from the row.names of the data:
swiss$canton <- row.names(swiss)
Create an object called x. This will be the swiss data, but only keeping those cantons which are more than 80% catholic.
How many rows are in x?
Go back to your swiss data. Is there an association between Education and Catholicism? Show it in a chart.
Create a new variable in swiss called edu, which indicates whether the canton is highly educated (at or above the average) or has low education (below the average). Here’s the code to do this:
swiss <- swiss %>%
  mutate(edu = ifelse(Education >= mean(Education),
                      "High education",
                      "Low education"))
Create a variable in swiss called fert. This variable should show whether the canton has high fertility or low fertility. Use the code in the previous question as a guide.
Create an object called y. This will be the swiss data, grouped by both your new edu and fert variables, and then summarized to tell the average infant mortality rate for each subgroup.
Make a barplot of the data in y. Add the argument position = 'dodge' within geom_bar() so that the bars sit side by side instead of stacking.
Interpret the above chart. Which group has the highest infant mortality? Which one has the lowest?
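A sketch of a dodged barplot on the built-in swiss data. The names `sw` and `avg_im`, and the choice to split fert at the mean, are assumptions that mirror the edu code rather than anything the exercise prescribes.

```r
library(dplyr)
library(ggplot2)

# Rebuild both grouping variables so this block runs on its own
sw <- swiss %>%
  mutate(edu  = ifelse(Education >= mean(Education),
                       "High education", "Low education"),
         fert = ifelse(Fertility >= mean(Fertility),
                       "High fertility", "Low fertility"))

# Average infant mortality for each edu / fert subgroup
avg_im <- sw %>%
  group_by(edu, fert) %>%
  summarise(avg_mortality = mean(Infant.Mortality), .groups = "drop")

# position = 'dodge' places the bars next to each other instead of stacking them
p <- ggplot(data = avg_im,
            aes(x = edu, y = avg_mortality, fill = fert)) +
  geom_bar(stat = "identity", position = "dodge")
p
```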
Arrange the data by infant mortality. Which canton has the lowest?
What is the maximum infant mortality for all Swiss cantons?
What is the maximum infant mortality for cantons with high education?
What is the average infant mortality for cantons with high fertility?
Make a plot showing the relationship between fertility (on the x-axis) and infant mortality (on the y-axis).
Add color = edu to within the aes() section of the above chart.
Make a “density” plot of fertility.
Create a new variable in swiss using mutate. The new variable will be named ag. This should be whether a canton is very agricultural or not (ie, more than 50%). To do this:
swiss <- swiss %>%
  mutate(ag = ifelse(Agriculture > 50, 'Very agricultural',
                     'Not very agricultural'))
Create an overlapping density chart of infant mortality in which the fill shows whether a place is very agricultural or not.
Create a new variable in your data called cath. This should indicate whether or not a canton is very Catholic (ie, > 50%).
Create an overlapping density chart of fertility in which the fill is whether a place is very Catholic or not.
Use ggridges to create a ridgeline chart of the distribution of infant mortality by your cath variable (here’s the code to help)
library(ggridges)
ggplot(data = swiss,
       aes(x = Infant.Mortality,
           y = cath)) +
  ggridges::geom_density_ridges()
Create a ridgeline chart (like above) showing fertility instead of infant mortality.
Which areas have more children: Catholic or non-Catholic areas?
Use the cut function to generate a 4-category categorical variable of the Education variable. Call this new variable educat. Note that cut takes two arguments: first, the variable you are cutting; second, the number of categories you want.
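A short sketch of cut() on the built-in swiss data. It is shown on a standalone vector; attaching the result to the data frame is left to the exercise.

```r
# With a single number as the second argument, cut() splits the range of the
# variable into that many equal-width intervals and returns a factor
educat <- cut(swiss$Education, 4)

# The interval labels, and how many cantons fall in each one
levels(educat)
table(educat)
```

Because the intervals have equal width rather than equal counts, the categories will usually hold very different numbers of cantons.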
Make a ridgeline plot of infant mortality’s distribution by your new educat variable.
Can you create the following plot? You’ll need to add facet_wrap(~educat)…
ggplot(data = swiss,
       aes(x = Catholic,
           y = Infant.Mortality,
           size = Fertility)) +
  facet_wrap(~educat) +
  geom_point(alpha = 0.6)
What percentage of high-Catholicism cantons are agricultural?
What percentage of low fertility cantons are Catholic?
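Percentage questions like these can be approached by filtering to the subgroup and taking the mean of a TRUE/FALSE condition. The sketch below assumes the 50% cut-offs used for ag and cath; the name `pct` is only illustrative.

```r
library(dplyr)

# Keep the very Catholic cantons, then compute the share that is very agricultural.
# mean() of a logical vector gives the proportion of TRUE values.
pct <- swiss %>%
  filter(Catholic > 50) %>%
  summarise(pct_agricultural = 100 * mean(Agriculture > 50))
pct
```

The same pattern works for the second question by swapping the filter and the condition.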
data('ChickWeight')
frangos <- ChickWeight
2 How many columns does the data have?
3 How many rows does the data have?
4 What are the variable types (quantitative/numeric or categorical)?
5 Create a point chart showing weight on the x-axis and time on the y-axis.
6 Create an object called frango1. This should be just the data for chicken number 1 (ie, 1 in the Chick column).
7 Chart the weight of chicken 1 over time using geom_point()
8 Chart the weight of chicken 1 over time using geom_line()
9 Chart the weight of chicken 1 over time using geom_area()
10 Chart the weight of all chickens over time using geom_line(). Make the color of each line different for each chicken.
11 Create an object called zero. This should be the frangos only at day 0 (ie, when they are born).
12 Make a histogram of the chickens’ weights at day 0.
13 Make a density plot of the chickens’ weights at day 0.
14 Make a violin plot of the chickens’ weights at day 0. On the x-axis, put the Diet type. On the y-axis, put the weight.
15 Add points to the violin plot
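Two of the charts in this set can be sketched with the built-in ChickWeight data: one line per chicken, and a violin plot with points on top. Hiding the legend and using geom_jitter() for the points are choices made here, not requirements of the exercise.

```r
library(ggplot2)

# One line per chicken: group keeps each chicken's points connected,
# color gives each chicken its own color
p_lines <- ggplot(data = ChickWeight,
                  aes(x = Time, y = weight,
                      group = Chick, color = Chick)) +
  geom_line() +
  theme(legend.position = "none")  # many chickens make the legend unwieldy
p_lines

# Weights at day 0 only
zero <- subset(as.data.frame(ChickWeight), Time == 0)

# Violin per diet, with the individual chickens jittered sideways on top
p_violin <- ggplot(data = zero,
                   aes(x = Diet, y = weight)) +
  geom_violin() +
  geom_jitter(width = 0.1, height = 0)
p_violin
```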
1 Run the following to get some data on “unidades sanitarias” on your computer:
us <- read_csv("https://raw.githubusercontent.com/databrew/databrew.github.io/master/us.csv")
2 Look at the head of the data.
3 Make a scatterplot of the us data, with longitude on the x-axis and latitude on the y-axis.
4 Make an object called province. This should be the number of unidades sanitarias per province.
5 Make an object called district. This should be the number of unidades sanitarias per district.
6 Create an object called types. This should be the number of unidades sanitarias per type.
7 Make a barplot of the number of unidades sanitarias by type.
8 Add coord_flip to the code to make the plot horizontal instead of vertical.
9 Which type of health post is most common?
10 Make another scatterplot of the us data, with longitude on the x-axis and latitude on the y-axis, but make the color of each point reflect the province.
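The count-then-plot steps, including coord_flip(), can be sketched on made-up data. The `facilities` tibble and its type labels are hypothetical and do not come from the us file.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per health facility
facilities <- tibble(
  type = c("Type A", "Type B", "Type A", "Type C", "Type A", "Type B")
)

# Number of facilities per type
types <- facilities %>%
  group_by(type) %>%
  tally()

# coord_flip() swaps the axes, so the bars run horizontally
p <- ggplot(data = types,
            aes(x = type, y = n)) +
  geom_bar(stat = "identity") +
  coord_flip()
p
```

Horizontal bars are usually easier to read when the category labels are long.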
install.packages('sp')
install.packages('maps')
install.packages('leaflet')
install.packages('rgdal')
install.packages('rgeos')
install.packages('leaflet.extras')
install.packages('maptools')
install.packages('raster')
install.packages('ggthemes')
library(tidyverse)
library(raster)
library(sp)
library(ggthemes)
moz3 <- getData(country = 'MOZ', level = 3)
ggplot(data = moz3,
       aes(x = long,
           y = lat,
           group = group)) +
  geom_polygon()
## Regions defined for each Polygons
Use the color argument in the geom_polygon to draw some borders.
Use the fill argument in the geom_polygon to change the color of the inside of the polygons
Add a theme to the map, for example, theme_map()
Use the labs argument to add a title, subtitle and caption
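How these options combine can be sketched on a made-up polygon. The `square` data frame is hypothetical, and theme_void() from ggplot2 is swapped in so the sketch needs no extra package; theme_map() from ggthemes is added to a plot in the same way.

```r
library(ggplot2)

# Hypothetical square standing in for a real boundary polygon
square <- data.frame(
  long  = c(0, 1, 1, 0),
  lat   = c(0, 0, 1, 1),
  group = 1
)

p <- ggplot(data = square,
            aes(x = long,
                y = lat,
                group = group)) +
  geom_polygon(color = "black",      # border of the polygon
               fill = "lightblue") + # inside of the polygon
  theme_void() +                     # stand-in for theme_map()
  labs(title = "A title",
       subtitle = "A subtitle",
       caption = "A caption")
p
```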
us <- read_csv("https://raw.githubusercontent.com/databrew/databrew.github.io/master/us.csv")
library(leaflet)
library(leaflet.extras)
Go to your web browser and look at the different “provider” tiles for leaflet: https://leaflet-extras.github.io/leaflet-providers/preview/. Pick one that you like and write it down (for example, “Thunderforest.SpinalMap” or “OpenStreetMap.Mapnik”).
Make an object called mytile. Assign to this the name of your favorite tile. For example:
mytile <- "OpenStreetMap.Mapnik"
leaflet() %>%
  addProviderTiles(mytile)
us <- read_csv("https://raw.githubusercontent.com/databrew/databrew.github.io/master/us.csv")
leaflet() %>%
  addProviderTiles(mytile) %>%
  addMarkers(data = us)
## Assuming "longitude" and "latitude" are longitude and latitude, respectively
## Warning in validateCoords(lng, lat, funcName): Data contains 67 rows with either
## missing or invalid lat/lon values and will be ignored
leaflet() %>%
  addProviderTiles(mytile) %>%
  addCircleMarkers(data = us,
                   color = 'red', radius = 0.1)
## Warning in validateCoords(lng, lat, funcName): Data contains 67 rows with either
## missing or invalid lat/lon values and will be ignored
leaflet() %>%
  addProviderTiles(mytile) %>%
  addCircleMarkers(data = us,
                   color = 'red', radius = 0.1,
                   popup = ~name)
## Assuming "longitude" and "latitude" are longitude and latitude, respectively
## Warning in validateCoords(lng, lat, funcName): Data contains 67 rows with either
## missing or invalid lat/lon values and will be ignored
leaflet() %>%
  addProviderTiles(mytile) %>%
  addCircleMarkers(data = us,
                   color = 'red', radius = 0.1,
                   popup = ~name,
                   clusterOptions = markerClusterOptions())
## Warning in validateCoords(lng, lat, funcName): Data contains 67 rows with either
## missing or invalid lat/lon values and will be ignored
library(tidyverse)
library(rgdal)
library(raster)
moz0 <- getData(country = 'MOZ', level = 0)
leaflet() %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addPolygons(data = moz0)
moz1 <- getData(country = 'MOZ', level = 1)
leaflet() %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addPolygons(data = moz1)
gaza <- moz1[moz1$NAME_1 == 'Gaza',]
leaflet() %>%
  addProviderTiles("Esri.WorldImagery") %>%
  addPolygons(data = gaza)
us <- read_csv("https://raw.githubusercontent.com/databrew/databrew.github.io/master/us.csv")
gaza_us <- us %>% filter(province == 'GAZA')